perm filename WHY[RDG,DBL] blob sn#552130 filedate 1980-12-15 generic text, type C, neo UTF8
Trouble with proposed evolutionary task:
First, a digression, to explore the possibilities which would arise
from the following task:

You have been given n sequences of characters, and asked to find how closely
they are related, in pairs.
(Unbeknownst to you, these are Shakespearean plays.)

By blind search alone you may find "syntactic" matches - i.e., a particular
string of symbols seems to occur in some vaguely described context, with
a high frequency.  You might even discover that the most common symbol,
" ", appears, on the average, every 5 or so symbols, with a relatively
restricted range -- from a separation of 1 to a maximum of 15.
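
As a rough illustration of this kind of blind measurement, here is a minimal
Python sketch.  The function name separator_stats and the sample string are
made up for this note; the sketch assumes only that the text arrives as one
string of symbols, with no prior idea which symbol is the blank.

```python
from collections import Counter

def separator_stats(text):
    """Return (most common symbol, mean gap, min gap, max gap).

    A "gap" is the distance between consecutive occurrences of the
    most common symbol, so two adjacent occurrences have a gap of 1.
    """
    symbol, _ = Counter(text).most_common(1)[0]
    positions = [i for i, ch in enumerate(text) if ch == symbol]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if not gaps:
        # The symbol occurs only once, so there is nothing to measure.
        return symbol, None, None, None
    return symbol, sum(gaps) / len(gaps), min(gaps), max(gaps)

# Illustrative input only; not taken from any of the n sequences.
sample = "to be or not to be that is the question"
print(separator_stats(sample))
```

Nothing here knows what a "word" is; the blank simply falls out as the
symbol with the highest count and a narrow range of separations.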

You might also find characters, such as "." or ",", which seem to occur
only immediately before a " ", and never before one another or after that blank.
One could soon find select prefixes and suffixes (such as "ing" or "s");
and by subtracting these out, certain morphological roots may become apparent.
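
A sketch of how these two observations might be made mechanically follows.
All of the names (followers, always_before, suffix_counts) and the sample
string are hypothetical, chosen for this note; the fixed suffix length of 3
is likewise an arbitrary assumption, not a claim about how the search
should really be run.

```python
from collections import Counter, defaultdict

def followers(text):
    """Map each symbol to the set of symbols seen immediately after it."""
    nxt = defaultdict(set)
    for a, b in zip(text, text[1:]):
        nxt[a].add(b)
    return dict(nxt)

def always_before(text, sep=" "):
    """Symbols whose every observed successor is `sep`."""
    return sorted(ch for ch, s in followers(text).items() if s == {sep})

def suffix_counts(words, length=3):
    """Count the final `length` symbols of every word longer than `length`."""
    return Counter(w[-length:] for w in words if len(w) > length)

# Illustrative input only.
sample = "sing, o muse. the king is running, the ring is lost. "
marks = always_before(sample)              # candidate punctuation symbols
words = [w.strip("".join(marks)) for w in sample.split()]
print(marks, suffix_counts(words).most_common(1))
```

Subtracting the most frequent endings from the words that carry them is then
one crude way to expose candidate roots.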

Given enough cycles, I'll even believe one could find parts of speech - e.g.,
this "word" (now defined, using the delimiters shown above) is a noun, and can
therefore occur in the following places; analogously this verb can be modified
in this way.

The point of the discussion above is that this is ALL we'll get, given this
investigative framework.  Lexical and syntactic notions could be derived, 
but nothing with "semantic" content. 
Such a search might deduce that Macbeth is more closely related to Henry IV,
Part 2 (HIV) than to the Merry Wives of Windsor (MWW) because the
tragedies/histories have more occurrences of words like "kill" - or it might
be quite content simply to observe that Falstaff and a few of his cronies
occur in both HIV and MWW, and reach the opposite (and here, it turns out,
correct) conclusion that HIV and MWW had a common root in Shakespeare's mind.
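
To make the purely lexical character of such a comparison concrete, here is
a minimal sketch of a pairwise matcher.  It scores each pair of texts by the
overlap of their vocabularies (Jaccard similarity), which is a measure chosen
here for illustration and not one proposed above.  The three toy strings are
invented stand-ins, not text from the plays.

```python
from itertools import combinations

def vocabulary(text):
    """The set of distinct blank-delimited words in a text."""
    return set(text.lower().split())

def jaccard(a, b):
    """Size of the intersection over size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_relatedness(texts):
    """Score every unordered pair of named texts by shared vocabulary."""
    vocab = {name: vocabulary(t) for name, t in texts.items()}
    return {(x, y): jaccard(vocab[x], vocab[y])
            for x, y in combinations(sorted(vocab), 2)}

# Invented toy sequences, for illustration only.
toy = {
    "A": "the king will kill the traitor",
    "B": "the knight will drink with the host",
    "C": "the knight will drink and jest",
}
scores = pairwise_relatedness(toy)
print(max(scores, key=scores.get), scores)
```

Which pair comes out on top depends entirely on which surface words happen
to be shared; the score carries no notion of why they are shared.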

Clearly this matcher would be far more powerful if it had some
knowledge of, for example, scripts or plans - as well as an elaborate
lexicon relating (at minimum) synonyms to one another -- a more complete
semantic net would be even more useful.

In the field of Molecular Genetics, knowledge at this level would be things
like chemical pathways, or various levels of functionality of the eventual
proteins, or ...  Things like coding region endmarks, while deducible, would
also be useful to have in advance - these correspond to the " " above.